Abstract

This paper studies the problem of finding a $(1+\epsilon)$-approximation to positive semidefinite programs. These are semidefinite programs in which all matrices in the constraints and objective are positive semidefinite and all scalars are non-negative. Previous work (Jain, Yao, FOCS'11) gave an NC algorithm that requires at least $O(\frac{1}{\epsilon^{13}} \log^{13} n)$ iterations. The algorithm performs at least $\Omega(n^{\omega})$ work per iteration, where $n$ is the dimension of the matrices involved, since each iteration involves computing a spectral decomposition.

We present a simpler NC parallel algorithm that requires $O(\frac{1}{\epsilon^{4}} \log^{4} n \log(\frac{1}{\epsilon}))$ iterations. Moreover, if the positive SDP is provided in a factorized form, the total work of our algorithm can be bounded by $\tilde{O}(n + q)$, where $q$ is the total number of nonzero entries in the factorization. Our algorithm is a generalization of Young's algorithm and analysis techniques for positive linear programs (Young, FOCS'01) to the semidefinite programming setting.

1 Introduction



Semidefinite programming (SDP), alongside linear programming, has been an important tool in approximation algorithms, optimization, and discrete mathematics. Very often, the goal is to efficiently find an approximation to the optimal solution of a given program. But like linear programs, SDPs are in general P-complete to solve exactly or even to approximate to any constant accuracy, suggesting that an NC or RNC algorithm for them is unlikely to exist. For this reason, we focus on parallel algorithms for finding approximate solutions to a particular class of semidefinite programs, known as positive SDPs [JY11]. These positive SDPs are a natural generalization of positive linear programs, which have been well studied in both sequential and parallel contexts (see, e.g., [LN93, PST95, GK98, You01, KY07, KY09]).

A semidefinite program, in general, can be solved to an arbitrary additive error $\epsilon$ using algorithms such as the Ellipsoid algorithm (with $\mathrm{poly}(1/\epsilon)$ convergence) or the interior-point methods (with $\mathrm{poly}(\log \frac{1}{\epsilon})$ convergence) [GLS93]. More recent efforts tend to focus on developing fast algorithms for finding a $(1+\epsilon)$-approximation to SDPs, leading to a series of nice sequential algorithms (e.g., [AHK05, AK07]). These iterative algorithms all use multiplicative weights update methods, and the number of iterations required depends on the so-called "width" parameter of the input program. In some instances, this width parameter can be as large as $O(n)$, making it the bottleneck in the depth of direct parallelization.

Most recently, Jain and Yao [JY11] studied an important class of SDPs known as positive SDPs and gave the first algorithm whose work and depth are independent of the width parameter (commonly known in the literature as width-independent algorithms). In this paper, we revisit their work and present a simpler and faster algorithm for solving positive SDPs to within a $(1+\epsilon)$-factor of the optimal solutions. The input consists of an accuracy parameter $\epsilon > 0$ and a positive semidefinite program (PSDP) in the following standard primal form:

$$
\begin{array}{lll}
\text{Minimize} & C \bullet Y & \\
\text{Subject to:} & A_i \bullet Y \ge b_i & \text{for } i = 1, \dots, m \\
& Y \succeq 0, &
\end{array}
\tag{1.1}
$$

where the matrices $C, A_1, \dots, A_m$ are $n$-by-$n$ symmetric positive semidefinite matrices, and the scalars $b_1, \dots, b_m$ are non-negative reals. This is a subclass of SDPs, where in the general case, these matrices only have to be Hermitian and these scalars are only required to be real numbers. As with Jain and Yao, we assume that the input SDP has strong duality. The main theorem below quantifies the cost of our algorithm in terms of the number of iterations. The work and depth bounds implied by this theorem vary with the format of the input and how the matrix exponentials are computed in each iteration. As we discuss below, with input given in a suitable (and natural) form, our algorithm runs in nearly-linear work and polylogarithmic depth.

Theorem 1.1 (Main Theorem) Given a primal positive SDP involving $n \times n$ matrices with $m$ constraints and an accuracy parameter $\epsilon > 0$, there is an algorithm approxPSDP that produces a $(1+\epsilon)$-approximation in $O(\frac{1}{\epsilon^{4}} \log^{4} n \log(\frac{1}{\epsilon}))$ iterations, where each iteration involves only standard primitives on matrices and a special primitive that computes $\exp(\Phi) \bullet A$ ($\Phi$ and $A$ are both positive semidefinite).

1.1 Overview 

Both Jain and Yao's algorithm and our algorithm can be seen as generalizations of previous work on positive linear programs (positive LPs). Jain and Yao extended the algorithm of Luby and Nisan [LN93]. Their algorithm works directly on the primal program. Using the dual as a guide, the update step is intricate, as their analysis is based on carefully analyzing the eigenspaces of a particular matrix before and after each update. Unlike the algorithm of Jain and Yao, our solution relies, at the core, on an algorithm for solving the decision version of the dual program. We derive this core routine by generalizing the algorithm and analysis techniques of Young [You01], using the matrix multiplicative weights update (MMWU) mechanism in place of Young's "soft" min and max. This leads to a very simple algorithm: besides standard operations on (sparse) matrices, the only special primitive needed is the matrix dot product $\exp(\Phi) \bullet A$, where $\Phi$ and $A$ are both positive semidefinite.

More specifically, our algorithm considers the following normalized primal/dual programs:

Primal (1.2-P):

$$
\begin{array}{lll}
\text{Minimize} & \mathrm{Tr}[Y] & \\
\text{Subject to:} & A_i \bullet Y \ge 1 & \text{for } i = 1, \dots, m \\
& Y \succeq 0 &
\end{array}
$$

Dual (1.2-D):

$$
\begin{array}{ll}
\text{Maximize} & \mathbf{1}^T x \\
\text{Subject to:} & \sum_{i=1}^{m} x_i A_i \preceq I \\
& x \ge 0.
\end{array}
\tag{1.2}
$$

where $I$ represents the identity matrix. This is without loss of generality, as it can be obtained by "dividing through" by $C$ (see Appendix A). To solve this SDP, we design an algorithm for solving the decision version of it: Given a goal value $o$, either find a solution $x \in \mathbb{R}^m_+$ to (1.2-D) with objective at most $(1+\epsilon/2)o$, or a solution $Y$ to (1.2-P) with objective at least $(1-\epsilon/2)o$. By further scaling the $A_i$'s, it suffices to only consider the case where $o = 1$. With this algorithm, the optimization version can be solved by binary searching on the objective a total of at most $O(\log(\frac{1}{\epsilon}))$ iterations.

Work and Depth. Similar to the sequential setting [AHK05, AK07], the majority of the cost of each iteration of our algorithm comes from computing matrix exponentials, or more precisely the values of $A_i \bullet \exp(\Phi)$, where $\Phi$ is some PSD matrix. The cost of our algorithm therefore depends on how the input is specified. When the input is given prefactored, that is, the $n$-by-$n$ matrices $A_i$ are given as $A_i = Q_i Q_i^T$ and the matrix $C^{-1/2}$ is given, Theorem 4.1 can be used to compute the matrix exponential in $O(\frac{1}{\epsilon^{3}}(n + M) \log^{2} M \log(1/\epsilon))$ work and $O(\frac{1}{\epsilon} \log^{2} M \log(1/\epsilon))$ span, where $M$ is the number of nonzero entries across the $Q_i$'s and $C^{-1/2}$. This is because the matrix $\Phi$ that we exponentiate has $\|\Phi\|_2 \le O(K)$, as shown in Lemma 3.5.

Therefore, as a corollary to the main theorem, we have the following cost bounds: 

Corollary 1.2 The algorithm approxPSDP can be implemented in $\tilde{O}(n + M)$ work and $O(\log^{O(1)} M)$ depth.

If the input program is not given in this form, we can add a preprocessing step that factors each $A_i$ into $Q_i Q_i^T$, since each $A_i$ is positive semidefinite. Often, these matrices have certain structure that makes them easier to factor (e.g., Laplacians). In general, this preprocessing requires at most $O(n^{4})$ work and $O(\log^{3} n)$ depth using standard parallel QR factorization [JaJ92]. Similarly, we can factor and invert $C$ in the same cost bound. We can often do better if $C$ has some structure.

2 Background and Notation 

In this section, we review notation and facts which will prove useful later in the paper. Throughout this paper, we use the notation $\tilde{O}(f(n))$ to mean $O(f(n)\,\mathrm{polylog}(f(n)))$.

Matrices and Positive Semidefinite Matrices. Unless otherwise stated, we will deal with real symmetric matrices in $\mathbb{R}^{n \times n}$. A symmetric matrix $A$ is positive semidefinite, denoted by $A \succeq 0$ or $0 \preceq A$, if for all $x \in \mathbb{R}^{n}$, $x^T A x \ge 0$. Equivalently, this means that all eigenvalues of $A$ are non-negative and the matrix $A$ can be written as

$$ A = \sum_{i=1}^{n} \lambda_i v_i v_i^T, $$

where $v_1, v_2, \dots, v_n$ are the eigenvectors of $A$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$, respectively. We will write $\lambda_1(A), \lambda_2(A), \dots, \lambda_n(A)$ to represent the eigenvalues of $A$ in decreasing order.

Notice that positive semidefiniteness induces a partial ordering on matrices. For this, we write $A \preceq B$ if $B - A \succeq 0$.

The trace of a matrix $A$, denoted $\mathrm{Tr}[A]$, is the sum of the matrix's diagonal entries: $\mathrm{Tr}[A] = \sum_i A_{i,i}$. Alternatively, the trace of a matrix can be expressed as the sum of its eigenvalues, so $\mathrm{Tr}[A] = \sum_i \lambda_i(A)$. Furthermore, we define

$$ A \bullet B = \sum_{i,j} A_{i,j} B_{i,j} = \mathrm{Tr}[AB]. $$

It follows that $A$ is positive semidefinite if and only if $A \bullet B \ge 0$ for all PSD $B$.

Matrix Exponential. Given an $n \times n$ symmetric positive semidefinite matrix $A$ and a function $f : \mathbb{R} \to \mathbb{R}$, we define

$$ f(A) = \sum_{i=1}^{n} f(\lambda_i) v_i v_i^T, $$

where, again, $v_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$. It is not difficult to check that for $\exp(A)$, this definition coincides with $\exp(A) = \sum_{i \ge 0} \frac{1}{i!} A^{i}$.
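
As a quick numerical sanity check of this definition, the following sketch (not part of the original paper) builds $f(A)$ from an eigendecomposition with NumPy and compares the result for $f = \exp$ against a truncated power series. The helper names `matrix_function` and `exp_power_series`, the 40-term cutoff, and the example matrix are illustrative assumptions.

```python
import numpy as np

def matrix_function(A, f):
    """Apply a scalar function f to a symmetric matrix A via its eigendecomposition."""
    lam, V = np.linalg.eigh(A)           # A = V diag(lam) V^T with orthonormal columns
    return (V * f(lam)) @ V.T            # sum_i f(lam_i) v_i v_i^T

def exp_power_series(A, terms=40):
    """Truncated series sum_{i < terms} A^i / i! (the cutoff is an illustrative choice)."""
    total = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i              # term now holds A^i / i!
        total = total + term
    return total

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # small symmetric PSD example
E_eig = matrix_function(A, np.exp)
E_ser = exp_power_series(A)
print(np.allclose(E_eig, E_ser))
```

The same helper also makes it easy to confirm the two expressions for the trace and for $A \bullet B$ given above on small examples.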

Our algorithm relies on a matrix multiplicative weights algorithm, which can be summarized as follows. For a fixed $\epsilon_0 \le \frac{1}{2}$ and $W^{(1)} = I$, we play a repeated "game" a number of times, where in iteration $t = 1, 2, \dots$, the following steps are performed:

1. Produce a probability matrix $P^{(t)} = W^{(t)} / \mathrm{Tr}[W^{(t)}]$,

2. Incur a gain matrix $M^{(t)}$, and

3. Update the weight matrix as $W^{(t+1)} = \exp(\epsilon_0 \sum_{t' \le t} M^{(t')})$.

As in the standard setting of multiplicative weight algorithms, the gain matrix is chosen by an external party, possibly adversarially. In our algorithm, the gain matrix is chosen to reflect the step we make in the iteration. Arora and Kale [AK07] show that this algorithm offers the following guarantees (restated for our setting):

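Theorem 2.1 ([AK07]) For $\epsilon_0 \le \frac{1}{2}$, if the $M^{(t)}$'s are all PSD and $\preceq I$, then after $T$ iterations,

$$ (1 + \epsilon_0) \sum_{t=1}^{T} M^{(t)} \bullet P^{(t)} \ge \lambda_{\max}\left(\sum_{t=1}^{T} M^{(t)}\right) - \frac{\ln n}{\epsilon_0}. \tag{2.1} $$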
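
To make the game concrete, here is a minimal simulation sketch, assuming randomly generated gain matrices in place of an adversary. It plays the three steps above and returns both sides of inequality (2.1). The function name `mmwu_bound_sides`, the random instance, and the choice $\epsilon_0 = 0.25$ are illustrative assumptions, and the matrix exponential is computed by a dense eigendecomposition rather than by any method from the paper.

```python
import numpy as np

def sym_expm(S):
    """exp(S) for a symmetric matrix S via eigendecomposition."""
    lam, V = np.linalg.eigh(S)
    return (V * np.exp(lam)) @ V.T

def mmwu_bound_sides(gains, eps0):
    """Return (left, right) sides of inequality (2.1) for a sequence of gain matrices."""
    n = gains[0].shape[0]
    running = np.zeros((n, n))            # sum of the gain matrices seen so far
    weighted_gain = 0.0
    for M in gains:
        W = sym_expm(eps0 * running)      # W^(t); equals I in the first round
        P = W / np.trace(W)               # probability matrix, trace one
        weighted_gain += np.sum(M * P)    # M • P = Tr[M P] for symmetric matrices
        running += M
    lam_max = np.linalg.eigvalsh(running)[-1]
    return (1 + eps0) * weighted_gain, lam_max - np.log(n) / eps0

rng = np.random.default_rng(0)
gains = []
for _ in range(50):
    G = rng.standard_normal((4, 4))
    M = G @ G.T
    gains.append(M / np.linalg.eigvalsh(M)[-1])   # rescale so that 0 <= M <= I
left, right = mmwu_bound_sides(gains, eps0=0.25)
print(left >= right)
```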



3 Parallel Algorithm for Positive SDPs 

In this section, we describe a parallel algorithm for solving positive packing SDPs, inspired by Young's algorithm for positive LPs. By binary searching on the objective, the problem can be reduced to $O(\log(\frac{1}{\epsilon}))$ iterations of the following decision problem: Given a goal value $o$, either find a solution $x \in \mathbb{R}^m_+$ to (1.2-D) with objective at most $(1+\epsilon/2)o$, or a solution $Y$ to (1.2-P) with objective at least $(1-\epsilon/2)o$. By further scaling the $A_i$'s, it suffices to only consider the case where $o = 1$.
The main theorem for this section is the following:

Theorem 3.1 There is an algorithm decisionPSDP that, given a positive SDP, in $O(\frac{\log^{4} n}{\epsilon^{4}})$ iterations either outputs a solution to the simplified packing problem (1.2-D) with objective at most $1 + O(\epsilon)$, or a solution to the simplified covering problem (1.2-P) with objective at least $1 - O(\epsilon)$.

Presented in Algorithm 3.1 is an algorithm satisfying this theorem. Before we go about proving the theorem, let us look at the algorithm more closely. Fix an accuracy parameter $\epsilon > 0$. We set $K$ to $\frac{1}{\epsilon}(1 + \ln n)$. The reason for this setting of $K$ is technical, but the motivation is that it lets us absorb the $\ln n$ term in Theorem 2.1 and account for the contribution of the starting solution $x^{(0)}$.

The algorithm is a multiplicative weights update algorithm, which proceeds in rounds. Initially, we work with the starting solution $x_i^{(0)} = \frac{1}{n \mathrm{Tr}[A_i]}$. This solution is chosen to be small enough that $\sum_i x_i^{(0)} A_i \preceq I$ but contains enough mass that subsequent updates, each of which is a multiple of the current solution, are guaranteed to make rapid progress. In each iteration, we compute $W^{(t)} = \exp(\Psi^{(t-1)})$, where $\Psi^{(t-1)} = \sum_i x_i^{(t-1)} A_i$. (For some intuition for the $W^{(t)}$ matrix, we refer the reader to [AK07, Kal07].)

Algorithm 3.1 Parallel Packing SDP algorithm

Let $K = \frac{1}{\epsilon}(1 + \ln n)$.

Initialize $\Psi^{(0)} = \sum_{i=1}^{m} x_i^{(0)} A_i$.
For $t = 1, \dots$, while $\mathbf{1}^T x < K$:

1. Let $W^{(t)} = \exp(\Psi^{(t-1)})$.

2. Let $p$ be such that $(1+\epsilon)^{p-1} < \mathrm{Tr}[W^{(t)}] \le (1+\epsilon)^{p}$.

3. Let $B^{(t)} = \{ i \in [n] : W^{(t)} \bullet A_i \le (1+\epsilon)^{p+1} \}$.

4. If $B^{(t)}$ is empty, return "infeasible" and $W^{(t)}$.

5. Let $\delta^{(t)} = \alpha \cdot x_{B}^{(t-1)}$, where $x_{B}^{(t-1)}$ is $x^{(t-1)}$ restricted to the coordinates in $B^{(t)}$ and $\alpha = \min\{ \epsilon / \|x_{B}^{(t-1)}\|_1, \; \epsilon / ((1+10\epsilon)K) \}$.

6. Update $x^{(t)} = x^{(t-1)} + \delta^{(t)}$ and $\Psi^{(t)} = \Psi^{(t-1)} + \sum_i \delta_i^{(t)} A_i$.

The next two lines of the algorithm are responsible for identifying which coordinates of $x$ will be updated. For starters, it may help to think of these lines as follows: let $P^{(t)} = W^{(t)} / \mathrm{Tr}[W^{(t)}]$ and let $B^{(t)}$ be simply $\{ i \in [n] : P^{(t)} \bullet A_i \le 1 + \epsilon \}$. The algorithm discretizes $\mathrm{Tr}[W^{(t)}]$ to ensure certain monotonicity properties on $B^{(t)}$. As we show later on in Lemma 3.2, the set $B^{(t)}$ cannot be empty unless the system is infeasible. The final steps of the algorithm update the values of $x^{(t)}$ for coordinates in $B^{(t)}$ multiplicatively.
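
The steps of Algorithm 3.1 can be sketched in a few lines of dense NumPy. This is only an illustration under stated assumptions, not the paper's parallel implementation: the matrix exponential is computed by a full eigendecomposition instead of the fast primitive of Section 4, the names `decision_psdp_sketch` and `max_iters` are hypothetical, and the starting point follows the text's $x_i^{(0)} = 1/(n\,\mathrm{Tr}[A_i])$, so the example uses as many constraints as the matrix dimension.

```python
import numpy as np

def sym_expm(S):
    """exp(S) for a symmetric matrix S via eigendecomposition (dense stand-in)."""
    lam, V = np.linalg.eigh(S)
    return (V * np.exp(lam)) @ V.T

def decision_psdp_sketch(As, eps, max_iters=100000):
    """Follow the steps of Algorithm 3.1 on a list of PSD matrices As."""
    n = As[0].shape[0]
    K = (1.0 + np.log(n)) / eps
    x = np.array([1.0 / (n * np.trace(A)) for A in As])    # starting solution x^(0)
    Psi = sum(xi * A for xi, A in zip(x, As))
    for _ in range(max_iters):                             # safety cap, not in the paper
        if x.sum() >= K:                                   # loop condition 1^T x < K fails
            return "feasible", x, Psi
        W = sym_expm(Psi)                                  # step 1
        p = int(np.ceil(np.log(np.trace(W)) / np.log1p(eps)))   # step 2
        thresh = (1.0 + eps) ** (p + 1)
        B = np.array([np.sum(W * A) <= thresh for A in As])     # step 3, W • A_i
        if not B.any():                                    # step 4
            return "infeasible", W, Psi
        xB = np.where(B, x, 0.0)                           # x restricted to B
        alpha = min(eps / xB.sum(), eps / ((1.0 + 10.0 * eps) * K))   # step 5
        delta = alpha * xB
        x = x + delta                                      # step 6
        Psi = Psi + sum(d * A for d, A in zip(delta, As))
    raise RuntimeError("iteration cap reached")

# Example: A_i = e_i e_i^T, for which the packing constraint reads diag(x) <= I.
eps = 0.2
As = [np.outer(e, e) for e in np.eye(3)]
status, x, Psi = decision_psdp_sketch(As, eps)
K = (1.0 + np.log(3)) / eps
print(status, x.sum() >= K, np.linalg.eigvalsh(Psi)[-1] <= (1 + 10 * eps) * K)
```

On termination with status "feasible", dividing `x` by the largest eigenvalue of `Psi` gives a vector satisfying the packing constraint of (1.2-D); the analysis below bounds how much objective value is lost by that scaling.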



3.1 Analysis 

To bound the approximation guarantees and the cost of this algorithm, we will rely on the following lemmas and claims that bound the spectrum of $\Psi^{(t)}$. Before we begin, we will need some notation and definitions. An easy induction gives that the quantities that we track across the iterations satisfy the following relationships:



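$$
\begin{aligned}
x^{(t)} &= x^{(0)} + \sum_{\tau=1}^{t} \delta^{(\tau)} \\
W^{(t)} &= \exp(\Psi^{(t-1)}) \\
P^{(t)} &\stackrel{\text{def}}{=} W^{(t)} / \mathrm{Tr}[W^{(t)}] \\
M^{(t)} &\stackrel{\text{def}}{=} \frac{1}{\epsilon} \sum_{i=1}^{n} \delta_i^{(t)} A_i \\
\Psi^{(t)} &= \sum_{i=1}^{n} x_i^{(t)} A_i = \Psi^{(0)} + \epsilon \sum_{\tau=1}^{t} M^{(\tau)}
\end{aligned}
$$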

First, we show that $B^{(t)}$ can never be empty unless the system is infeasible:

Lemma 3.2 (Feasibility) If there is an iteration $t$ such that $B^{(t)}$ is empty, then there does not exist $x$ such that

1. $\mathbf{1}^T x \ge (1+\epsilon)K$

2. $\sum_{i=1}^{n} x_i A_i \preceq K I$

Proof: Note first that $\mathrm{Tr}[P^{(t)}] = 1$. Furthermore, the fact that $B^{(t)}$ is empty means that for all $i = 1, \dots, n$, $W^{(t)} \bullet A_i > (1+\epsilon)^{p+1}$. But we know that $\mathrm{Tr}[W^{(t)}] \le (1+\epsilon)^{p}$, so

$$ P^{(t)} \bullet A_i = \frac{W^{(t)} \bullet A_i}{\mathrm{Tr}[W^{(t)}]} > 1 + \epsilon. $$

Therefore, $P^{(t)}$ is a valid dual solution with objective at most 1, which in turn means no primal solution with objective more than 1 exists. ■

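We now analyze the initial solution:

Claim 3.3

$$ \lambda_{\max}(\Psi^{(0)}) = \lambda_{\max}\left(\sum_{i} x_i^{(0)} A_i\right) \le 1 $$

Proof: Our choice of $x^{(0)}$ guarantees that for all $i = 1, \dots, n$,

$$ x_i^{(0)} A_i = \frac{1}{n} \frac{A_i}{\mathrm{Tr}[A_i]} \preceq \frac{1}{n} I. $$

Summing across $i = 1, \dots, n$ gives the desired bound. ■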

Note that the claim implies that the starting solution satisfies $\sum_i x_i^{(0)} A_i \preceq I \preceq K I$, but it may not yet be a feasible solution for the whole system because it does not satisfy $\mathbf{1}^T x^{(0)} \ge K$. In the following lemma, we bound the $\ell_1$ norm of the solution vector at every intermediate step. Notice that the initial solution may not satisfy $\mathbf{1}^T x^{(0)} < K + \epsilon$, but if this happens, the algorithm will stop right away and the initial solution is trivially a feasible solution (since already $\sum_i x_i^{(0)} A_i \preceq K I$).

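Lemma 3.4 Let $T$ be the iteration in which the algorithm terminates. For $t = 1, \dots, T$,

$$ \mathbf{1}^T x^{(t)} \le K + \epsilon. $$

Proof: Since $T$ is the iteration in which the algorithm terminates and for all $t > 0$, $\delta^{(t)} \ge 0$, we have

$$ \mathbf{1}^T x^{(t)} \le \mathbf{1}^T x^{(T-1)} + \mathbf{1}^T \delta^{(T)} < K + \mathbf{1}^T \delta^{(T)}. $$

Furthermore, we know that $\mathbf{1}^T x^{(0)} = \sum_i 1/(n \mathrm{Tr}[A_i])$. But by our choice of $\alpha$, we know that $\alpha \le \epsilon / \|x_{B}^{(T-1)}\|_1$ and so $\mathbf{1}^T \delta^{(T)} = \|\delta^{(T)}\|_1 \le \epsilon$, so $\mathbf{1}^T x^{(T)} \le K + \epsilon$. ■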



Spectrum Bound. Finally, we bound the spectrum of our solutions: 

Lemma 3.5 (Spectrum Bound) For $t = 0, \dots, T$, where $T$ is the final iteration,

$$ \Psi^{(t)} = \sum_{i=1}^{n} x_i^{(t)} A_i \preceq (1 + 10\epsilon) K I. \tag{3.7} $$

We prove this lemma by resorting to Theorem 2.1, which relates the final spectral values to the "gain" derived at each intermediate step. For this, we show a claim that quantifies the gain we get in each step as a function of the $\ell_1$-norm of the change we make in that step.

Claim 3.6 For all $t = 1, \dots, T$,

$$ M^{(t)} \bullet P^{(t)} \le \frac{(1+\epsilon)^{2}}{\epsilon} \|\delta^{(t)}\|_1. \tag{3.8} $$

Proof: Consider that

$$ M^{(t)} \bullet P^{(t)} = \frac{1}{\epsilon} \left( \sum_{i=1}^{n} \delta_i^{(t)} A_i \right) \bullet P^{(t)}. $$

Every $i \in B^{(t)}$ has the property that

$$ A_i \bullet W^{(t)} \le (1+\epsilon)^{p+1}. $$

So then, since

$$ P^{(t)} = \frac{W^{(t)}}{\mathrm{Tr}[W^{(t)}]} \preceq \frac{W^{(t)}}{(1+\epsilon)^{p-1}}, $$

we have

$$ A_i \bullet P^{(t)} \le (1+\epsilon)^{2}, $$

and thus

$$ M^{(t)} \bullet P^{(t)} \le \frac{1}{\epsilon} \left( \sum_{i \in B^{(t)}} \delta_i^{(t)} \right) (1+\epsilon)^{2}. $$

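Proof of Lemma 3.5: We can rewrite $\Psi^{(t)}$ as

$$ \Psi^{(t)} = \sum_{i=1}^{n} x_i^{(0)} A_i + \sum_{\tau=1}^{t} \sum_{i=1}^{n} \delta_i^{(\tau)} A_i = \sum_{i=1}^{n} x_i^{(0)} A_i + \epsilon \sum_{\tau=1}^{t} M^{(\tau)}, $$

so

$$ \lambda_{\max}(\Psi^{(t)}) \le \lambda_{\max}\left( \sum_{i=1}^{n} x_i^{(0)} A_i \right) + \epsilon \cdot \lambda_{\max}\left( \sum_{\tau=1}^{t} M^{(\tau)} \right) $$

since both sums yield positive semidefinite matrices.

By Claim 3.3, we know that the first term is at most 1. To bound the second term, we again apply Theorem 2.1, which we restate below:

$$ (1+\epsilon) \sum_{\tau=1}^{t} M^{(\tau)} \bullet P^{(\tau)} \ge \lambda_{\max}\left( \sum_{\tau=1}^{t} M^{(\tau)} \right) - \frac{\ln n}{\epsilon}. $$

Rearranging terms, we have

$$ \lambda_{\max}\left( \sum_{\tau=1}^{t} M^{(\tau)} \right) \le (1+\epsilon) \sum_{\tau=1}^{t} M^{(\tau)} \bullet P^{(\tau)} + \frac{\ln n}{\epsilon}. $$

We also want to make sure that each $M^{(\tau)}$ satisfies $M^{(\tau)} \preceq I$. With an easy induction on $t$, we can show that (3.7) holds for all $\tau \le t - 1$. This means that for $\tau = 1, \dots, t$, each $M^{(\tau)}$ satisfies

$$ M^{(\tau)} = \frac{1}{\epsilon} \sum_{i=1}^{n} \delta_i^{(\tau)} A_i \preceq \frac{1}{\epsilon} \sum_{i=1}^{n} \alpha x_i^{(\tau-1)} A_i = \frac{\alpha}{\epsilon} \sum_{i=1}^{n} x_i^{(\tau-1)} A_i \preceq I $$

since $\alpha \le \epsilon / ((1+10\epsilon)K)$. It then follows from Claim 3.6 that

$$ \epsilon \cdot \lambda_{\max}\left( \sum_{\tau=1}^{t} M^{(\tau)} \right) \le (1+\epsilon) \sum_{\tau=1}^{t} (1+\epsilon)^{2} \cdot \mathbf{1}^T \delta^{(\tau)} + \ln n, $$

which means

$$ \epsilon \cdot \lambda_{\max}\left( \sum_{\tau=1}^{t} M^{(\tau)} \right) \le \ln n + (1+\epsilon)^{4} K $$

because $\mathbf{1}^T x^{(t)} \le K + \epsilon \le (1+\epsilon)K$.

Combining these altogether, we get

$$ \lambda_{\max}(\Psi^{(t)}) \le 1 + \ln n + (1+\epsilon)^{4} K \le \epsilon K + (1+\epsilon)^{4} K \le (1 + 10\epsilon) K. $$

■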



3.2 Cost 

Similar to Young's analysis, our analysis relies on the notion of phases, grouping together iterations with similar $W^{(t)}$ matrices into a phase in a way that ensures the existence of a coordinate $i$ within it with the property that this coordinate is incremented (i.e., $\delta_i^{(t)} > 0$) by a significant amount in every iteration of this phase. To this end, we say that an iteration $t$ belongs to phase $p$ if and only if $(1+\epsilon)^{p-1} < \mathrm{Tr}[W^{(t)}] \le (1+\epsilon)^{p}$. A phase ends when the algorithm terminates or the next iteration belongs to a different phase.

Almost immediate from this definition is a bound on the number of phases: 

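Lemma 3.7 The total number of phases is at most $O(K/\epsilon)$.

Proof: On the one hand, we have $\Psi^{(0)} \succeq 0$, so $\mathrm{Tr}[W^{(0)}] \ge n \cdot e^{0} = n$. On the other hand, we know that $\Psi^{(T)} \preceq (1 + O(\epsilon))K I$, so $\mathrm{Tr}[W^{(T)}] \le n \exp((1 + O(\epsilon))K)$. This means that the total number of phases is at most

$$
\begin{aligned}
\log_{1+\epsilon} \frac{\mathrm{Tr}[W^{(T)}]}{\mathrm{Tr}[W^{(0)}]} &\le \log_{1+\epsilon} \exp((1 + O(\epsilon))K) && (3.9) \\
&\le \frac{1}{\epsilon}(1 + O(\epsilon))K = O\left(\frac{K}{\epsilon}\right) && (3.10)
\end{aligned}
$$

■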



To bound the total number of steps, we'll analyze the number of iterations within a phase. For this, we'll need 
a couple of claims. The first claim shows that if a coordinate is incremented at the end of a phase, it must have 
been incremented in every iteration of this phase. 

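Claim 3.8 If $i \in B^{(t)}$, then for all $t' \le t$ belonging to the same phase as $t$, $i \in B^{(t')}$.

Proof: Suppose $t$ belongs to phase $p$. As $i \in B^{(t)}$, we know that $W^{(t)} \bullet A_i \le (1+\epsilon)^{p+1}$. Since $t' \le t$ we have

$$
\begin{aligned}
\Psi^{(t-1)} - \Psi^{(t'-1)} &= \epsilon \sum_{t' \le \tau < t} M^{(\tau)} && (3.11) \\
&= \sum_{t' \le \tau < t} \sum_{i} \delta_i^{(\tau)} A_i \succeq 0 && (3.12)
\end{aligned}
$$

Therefore $W^{(t')} \preceq W^{(t)}$ and

$$ W^{(t')} \bullet A_i \le W^{(t)} \bullet A_i \le (1+\epsilon)^{p+1}, \tag{3.13} $$

which means that $i \in B^{(t')}$, as desired. ■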

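In the second claim, we place a bounding box around each coordinate $x_i^{(t)}$ of our solution vectors. This turns out to be an important piece of machinery in bounding the number of iterations required by the algorithm.

Claim 3.9 (Bounding Box) For every index $i$, at any iteration $t$,

$$ x_i^{(t)} \le (1 + O(\epsilon))\, n K / \mathrm{Tr}[A_i] = (1 + O(\epsilon))\, n^{2} K\, x_i^{(0)}. $$

Proof: Recall that our initial solution sets $x_i^{(0)} = 1/(n \mathrm{Tr}[A_i])$. To argue an upper bound on $x_i^{(t)}$, note that Lemma 3.5 gives

$$ \Psi^{(t)} \preceq (1 + O(\epsilon)) K I. \tag{3.14} $$

Since $\sum_{j} x_j^{(t)} A_j = \Psi^{(t)}$ and each $A_j$ is positive semidefinite, we have

$$ \mathrm{Tr}\left[ x_i^{(t)} A_i \right] \le \mathrm{Tr}\left[ \Psi^{(t)} \right] \le (1 + O(\epsilon)) n K. \tag{3.15} $$

We conclude that $x_i^{(t)} \le (1 + O(\epsilon)) n K / \mathrm{Tr}[A_i] = (1 + O(\epsilon)) n^{2} K x_i^{(0)}$, as claimed. ■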

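The final claim shows that each iteration makes significant progress in incrementing the solution.

Claim 3.10 In each iteration, either $\mathbf{1}^T \delta^{(t)} = \epsilon$ or $\alpha \ge \Omega(\epsilon/K)$.

Proof: We chose $\alpha$ to be $\min\{ \epsilon / \|x_{B}^{(t-1)}\|_1, \; \epsilon / ((1+10\epsilon)K) \}$. If $\alpha = \epsilon / \|x_{B}^{(t-1)}\|_1$, then $\mathbf{1}^T \delta^{(t)} = \epsilon$ and we are done. Otherwise, we have $\alpha = \epsilon / ((1+10\epsilon)K)$, which is $\Omega(\epsilon/K)$. ■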

Corollary 3.11 The number of iterations per phase is at most

$$ O\left( \frac{K}{\epsilon} \ln(nK) \right) $$

and the total number of iterations is at most

$$ O\left( \frac{\log^{3} n}{\epsilon^{4}} \right). $$

Proof: Consider a phase of the algorithm, and let $f$ be the final iteration of this phase. By Claim 3.10, each iteration $t$ causes $\mathbf{1}^T \delta^{(t)} = \epsilon$ or $\alpha \ge \Omega(\epsilon/K)$. Since $\mathbf{1}^T x^{(t)} \le K + \epsilon$ for all $t \le T$, the number of iterations in which $\mathbf{1}^T \delta^{(t)} = \epsilon$ can be at most $O(K/\epsilon)$. Now, let $i \in B^{(f)}$ be a coordinate that got incremented in the final iteration of this phase. By Claim 3.8, this coordinate got incremented in every iteration of this phase. Therefore, the number of iterations within this phase in which $\alpha \ge \Omega(\epsilon/K)$ is at most

$$ \log_{1+\Omega(\epsilon/K)}\left( x_i^{(f)} / x_i^{(0)} \right) \le \log_{1+\Omega(\epsilon/K)}\left( (1 + O(\epsilon)) n^{2} K \right) = O\left( \frac{K}{\epsilon} \ln((1 + O(\epsilon)) nK) \right). \tag{3.16} $$

Combining with Lemma 3.7 and the setting of $K = O(\frac{1}{\epsilon} \log n)$ gives the overall bound. ■
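
For reference, the arithmetic behind the overall bound can be spelled out as follows. This is a sketch that additionally assumes $\log(1/\epsilon) = O(\log n)$, so that $\ln(nK) = O(\log n)$; that assumption is ours and is not stated in the text.

```latex
\underbrace{O\!\left(\frac{K}{\epsilon}\right)}_{\text{phases (Lemma 3.7)}}
\cdot
\underbrace{O\!\left(\frac{K}{\epsilon}\ln(nK)\right)}_{\text{iterations per phase}}
= O\!\left(\frac{K^{2}}{\epsilon^{2}}\ln(nK)\right)
= O\!\left(\frac{\log^{2} n}{\epsilon^{4}}\cdot\log n\right)
= O\!\left(\frac{\log^{3} n}{\epsilon^{4}}\right),
\qquad K = \tfrac{1}{\epsilon}(1+\ln n).
```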
4 Evaluation of $\exp(\Phi) \bullet A_i$

We show the following result about computing the matrix dot product of a positive semidefinite matrix and the 
matrix exponential of another positive semidefinite matrix. 

Theorem 4.1 Suppose $\Phi$ has $p$ non-zero entries, $\|\Phi\|_2 \le \kappa$, and the $n$-by-$n$ matrices $A_i$ are given as $A_i = Q_i Q_i^T$, where the total number of nonzeros across the $Q_i$'s is $q$. Then, there is an algorithm bigDotExp($\Phi$, $\{A_i = Q_i Q_i^T\}_{i=1}^{m}$) that computes $\exp(\Phi) \bullet A_i$ up to a factor-$(1 \pm \epsilon)$ approximation in $O(\kappa \log n \log(1/\epsilon) + \log\log n)$ depth and $O(\frac{1}{\epsilon^{2}}(\kappa \log(1/\epsilon)\, p + q) \log n)$ work.

The idea behind proving Theorem 4.1 is to approximate the matrix exponential using a low-degree polynomial, as evaluating matrix exponentials exactly is costly. For this, we will apply the following lemma, reproduced from Lemma 6 in [AK07]:

Lemma 4.2 ([AK07]) If $B$ is a PSD matrix such that $\|B\|_2 \le \kappa$, then the operator

$$ \hat{B} = \sum_{0 \le i < k} \frac{1}{i!} B^{i}, \quad \text{where } k = \max\{ e^{2}\kappa, \ln(2\epsilon^{-1}) \}, $$

satisfies

$$ (1 - \epsilon) \exp(B) \preceq \hat{B} \preceq \exp(B). $$
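
The lemma can be checked numerically on a small instance. The sketch below is an illustration only: it rounds $k$ up to an integer, uses a dense eigendecomposition for the exact exponential, and the helper name `truncated_exp` and the random test matrix are assumptions not taken from the paper.

```python
import numpy as np
from math import ceil, e, log

def truncated_exp(B, eps):
    """Return (B_hat, k) with B_hat = sum_{0 <= i < k} B^i / i! and k as in Lemma 4.2."""
    kappa = np.linalg.norm(B, 2)                       # spectral norm of B
    k = max(1, ceil(max(e**2 * kappa, log(2.0 / eps))))
    n = B.shape[0]
    total = np.eye(n)
    term = np.eye(n)
    for i in range(1, k):
        term = term @ B / i                            # term now holds B^i / i!
        total = total + term
    return total, k

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
B = G @ G.T
B = 2.0 * B / np.linalg.norm(B, 2)                     # PSD with spectral norm 2
eps = 0.1
B_hat, k = truncated_exp(B, eps)

lam, V = np.linalg.eigh(B)
exact = (V * np.exp(lam)) @ V.T                        # exp(B) via eigendecomposition
# Smallest eigenvalues of the two differences; both should be non-negative up to rounding.
gap_upper = np.linalg.eigvalsh(exact - B_hat)[0]
gap_lower = np.linalg.eigvalsh(B_hat - (1 - eps) * exact)[0]
print(k, gap_upper >= -1e-9, gap_lower >= -1e-9)
```

Because $\hat{B}$ is a polynomial in $B$, applying it to a vector only needs $k - 1$ matrix-vector products with $B$, which is what makes the primitive cheap when $B$ is sparse.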

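Proof of Theorem 4.1: The given factorization of each $A_i$ allows us to write $\exp(\Phi) \bullet A_i$ as the 2-norm of a vector:

$$
\begin{aligned}
\exp(\Phi) \bullet A_i &= \mathrm{Tr}\left[ \exp(\Phi) Q_i Q_i^T \right] && (4.1) \\
&= \mathrm{Tr}\left[ Q_i^T \exp(\tfrac{1}{2}\Phi) \exp(\tfrac{1}{2}\Phi) Q_i \right] && (4.2) \\
&= \left\| \exp(\tfrac{1}{2}\Phi) Q_i \right\|_2^{2} && (4.3)
\end{aligned}
$$

By Lemma 4.2, it suffices to evaluate $\|\hat{B} Q_i\|_2^{2}$, where $\hat{B}$ is an approximation to $\exp(\frac{1}{2}\Phi)$. To further reduce the work, we can apply the Johnson-Lindenstrauss transformation to reduce the length of the vectors to $O(\log n)$; specifically, we find an $O(\frac{1}{\epsilon^{2}} \log n) \times n$ Gaussian matrix $\Pi$ and evaluate

$$ \| \Pi \hat{B} Q_i \|_2^{2}. \tag{4.4} $$

Since $\Pi$ only has $O(\frac{1}{\epsilon^{2}} \log n)$ rows, we can compute $\Pi \hat{B}$ using $O(\frac{1}{\epsilon^{2}} \log n)$ evaluations of $\hat{B}$. The work/depth bounds follow from doing each of the evaluations of $\hat{B} \pi_i$, where $\pi_i$ denotes the $i$-th column of $\Pi^T$, and the matrix-vector multiplies involving $\Phi$ in parallel. ■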
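
The two ingredients of the proof, the truncated series of Lemma 4.2 and a Gaussian sketch, can be combined in a short dense NumPy illustration. This is a sketch under stated assumptions rather than the paper's bigDotExp: the function names, the constant 8 in the number of sketch rows, and the random instance are all hypothetical choices, and no attempt is made to exploit sparsity or parallelism.

```python
import numpy as np

def taylor_apply(H, V, k):
    """Return (sum_{0 <= j < k} H^j / j!) @ V using only products with H."""
    term = V.copy()
    total = V.copy()
    for j in range(1, k):
        term = H @ term / j                  # term now holds H^j V / j!
        total = total + term
    return total

def big_dot_exp_sketch(Phi, Qs, eps, rng):
    """Estimate exp(Phi) • (Q Q^T) for each Q in Qs via ||Pi B_hat Q||^2."""
    n = Phi.shape[0]
    half = Phi / 2.0
    kappa = np.linalg.norm(half, 2)
    k = max(1, int(np.ceil(max(np.e**2 * kappa, np.log(2.0 / eps)))))
    r = int(np.ceil(8.0 * np.log(n + 1) / eps**2))   # sketch rows; constant 8 is an assumption
    Pi = rng.standard_normal((r, n)) / np.sqrt(r)
    # B_hat is symmetric, so Pi @ B_hat = (B_hat @ Pi.T).T
    PB = taylor_apply(half, Pi.T, k).T
    return [np.linalg.norm(PB @ Q, 'fro') ** 2 for Q in Qs]

rng = np.random.default_rng(3)
G = rng.standard_normal((6, 6))
Phi = G @ G.T
Phi = 3.0 * Phi / np.linalg.norm(Phi, 2)             # PSD with spectral norm 3
Qs = [rng.standard_normal((6, 2)) for _ in range(3)]

estimates = big_dot_exp_sketch(Phi, Qs, eps=0.2, rng=rng)
lam, V = np.linalg.eigh(Phi)
E = (V * np.exp(lam)) @ V.T                          # exact exp(Phi) for comparison
exact = [np.sum(E * (Q @ Q.T)) for Q in Qs]
print([round(est / ex, 3) for est, ex in zip(estimates, exact)])
```

The printed ratios should be close to 1; how close depends on the number of sketch rows, which is the usual accuracy versus work trade-off of a Johnson-Lindenstrauss projection.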
A Normalized Positive SDPs

This is the same transformation that Jain and Yao presented [JY11]; we only present it here for easy reference.

Consider the primal program in (1.1). It suffices to show that it can be transformed into the following program without changing the optimal value:

$$
\begin{array}{lll}
\text{Minimize} & \mathrm{Tr}[Z] & \\
\text{Subject to:} & B_i \bullet Z \ge 1 & \text{for } i = 1, \dots, m \\
& Z \succeq 0, &
\end{array}
\tag{A.1}
$$

We can make the following assumptions without loss of generality: First, $b_i > 0$ for all $i = 1, \dots, m$, because if $b_i$ were 0, we could have thrown the constraint away. Second, all $A_i$'s are in the support of $C$, or otherwise we know that the corresponding dual variable must be set to 0 and therefore can be removed right away. Therefore, we will treat $C$ as having full rank, allowing us to define

$$ B_i \stackrel{\text{def}}{=} \frac{1}{b_i} C^{-1/2} A_i C^{-1/2}. $$

It is not hard to verify that the normalized program (A.1) has the same optimal value as the original SDP (1.1). Note that if we are given a factorization of $A_i$ into $Q_i Q_i^T$, then $B_i$ can also be factorized as:

$$ B_i = \frac{1}{b_i} (C^{-1/2} Q_i)(C^{-1/2} Q_i)^T. $$

Furthermore, it can be checked that the dual of the normalized program is the same as the dual in Equation (1.2).
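
The following sketch illustrates the normalization numerically. It assumes the change of variables $Z = C^{1/2} Y C^{1/2}$, which is the standard way to read "dividing through by $C$" but is our assumption rather than something spelled out above, and it checks that the objective and constraint values carry over. The helper names and the random instance are illustrative.

```python
import numpy as np

def sym_power(C, power):
    """C^power for a symmetric positive definite matrix C via eigendecomposition."""
    lam, V = np.linalg.eigh(C)
    return (V * lam**power) @ V.T

def normalize_psdp(C, As, bs):
    """Return B_i = (1/b_i) C^{-1/2} A_i C^{-1/2} for each constraint."""
    C_inv_half = sym_power(C, -0.5)
    return [C_inv_half @ A @ C_inv_half / b for A, b in zip(As, bs)]

rng = np.random.default_rng(1)

def random_psd(n):
    G = rng.standard_normal((n, n))
    return G @ G.T

n = 4
C = random_psd(n) + np.eye(n)            # full rank, as assumed in the text
As = [random_psd(n) for _ in range(3)]
bs = [1.0, 2.0, 0.5]
Bs = normalize_psdp(C, As, bs)

Y = random_psd(n)                        # any PSD candidate for the original program
C_half = sym_power(C, 0.5)
Z = C_half @ Y @ C_half                  # assumed change of variables

print(np.isclose(np.trace(Z), np.sum(C * Y)))
print([bool(np.isclose(np.sum(B * Z), np.sum(A * Y) / b)) for A, B, b in zip(As, Bs, bs)])
```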



B Proofs about Matrix Multiplicative Weights Update Algorithm 

We give a proof of Theorem 2.1 for completeness. First, we state a fact which is easy to verify:

Fact B.1 If $x$ is a scalar such that $0 \le x \le \epsilon$, then:

$$ 1 + x \le e^{x} \le 1 + (1 + 2\epsilon)x $$

and this also generalizes to PSD matrices $A$ when $A \preceq \epsilon I$:

$$ I + A \preceq \exp A \preceq I + (1 + 2\epsilon) A $$

Since in general $AB \ne BA$, we cannot expect $\exp(A + B) = \exp(A)\exp(B)$. But we can relate their trace values using the Golden-Thompson inequality:

Lemma B.2 (Golden-Thompson inequality) If $A$ and $B$ are PSD matrices, then

$$ \mathrm{Tr}[\exp(A + B)] \le \mathrm{Tr}[\exp(A)\exp(B)] $$
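
A small numerical illustration of the inequality, assuming two random PSD matrices that do not commute. The helper name `sym_expm` and the test instance are illustrative and not part of the paper.

```python
import numpy as np

def sym_expm(S):
    """exp(S) for a symmetric matrix S via eigendecomposition."""
    lam, V = np.linalg.eigh(S)
    return (V * np.exp(lam)) @ V.T

rng = np.random.default_rng(4)
G1 = rng.standard_normal((4, 4))
G2 = rng.standard_normal((4, 4))
A = G1 @ G1.T / 4.0
B = G2 @ G2.T / 4.0

lhs = np.trace(sym_expm(A + B))
rhs = np.trace(sym_expm(A) @ sym_expm(B))
commutator_norm = np.linalg.norm(A @ B - B @ A)    # nonzero: A and B do not commute
print(commutator_norm > 0, lhs <= rhs)
```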

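Proof of Theorem 2.1:

$$
\begin{aligned}
& \mathrm{Tr}\left[ W^{(t+1)} \right] && (B.1) \\
&= \mathrm{Tr}\left[ \exp(\Psi^{(t)}) \right] && (B.2) \\
&= \mathrm{Tr}\left[ \exp(\Psi^{(t-1)} + \epsilon M^{(t)}) \right] && (B.3) \\
&\le \mathrm{Tr}\left[ \exp(\Psi^{(t-1)}) \exp(\epsilon M^{(t)}) \right] && (B.4) \\
&= \mathrm{Tr}\left[ W^{(t)} \exp(\epsilon M^{(t)}) \right] && (B.5) \\
&\le \mathrm{Tr}\left[ W^{(t)} (I + \epsilon(1 + 2\epsilon) M^{(t)}) \right] && (B.6) \\
&= \mathrm{Tr}\left[ W^{(t)} \right] + \epsilon(1 + 2\epsilon) W^{(t)} \bullet M^{(t)} && (B.7) \\
&= \mathrm{Tr}\left[ W^{(t)} \right] \left( 1 + \epsilon(1 + 2\epsilon) P^{(t)} \bullet M^{(t)} \right) && (B.8) \\
&\le \mathrm{Tr}\left[ W^{(t)} \right] \exp\left( \epsilon(1 + 2\epsilon) P^{(t)} \bullet M^{(t)} \right) && (B.9)
\end{aligned}
$$

An easy induction with the base case that $\mathrm{Tr}[W^{(1)}] = \mathrm{Tr}[I] = n$ then gives:

$$ \mathrm{Tr}\left[ \exp(\Psi^{(T)}) \right] \le n \exp\left( \sum_{t=1}^{T} \epsilon(1 + 2\epsilon) P^{(t)} \bullet M^{(t)} \right) \tag{B.10} $$

Using the fact that $\mathrm{Tr}[\exp(A)] \ge \exp(\lambda_{\max}(A))$ and taking logs of both sides gives:

$$ \lambda_{\max}(\Psi^{(T)}) \le \ln n + \sum_{t=1}^{T} \epsilon(1 + 2\epsilon) P^{(t)} \bullet M^{(t)} \tag{B.11} $$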

Dividing both sides by $\epsilon$ and substituting in $\Psi^{(T)} = \epsilon \sum_{t=1}^{T} M^{(t)}$ gives the desired result.